No matter the expressive power and sophistication of supervised learning algorithms, their effectiveness is restricted by the features describing the data. This is not a new insight in ML, and many methods for feature selection, transformation, and construction have been developed. But while work on general techniques for feature selection and transformation, i.e. dimensionality reduction, is on-going, work on feature construction, i.e. enriching the data, is by now mainly the domain of image recognition, particularly character recognition, and NLP. In this work, we propose a new general framework for feature construction. Class outliers indicate the need for feature construction in a data set, and discriminative pattern mining on their k-neighborhoods is used to derive new features. We instantiate the framework with LOF and C4.5-Rules, and evaluate the usefulness of the derived features on a diverse collection of UCI data sets. The derived features are more often useful than ones derived by DC-Fringe, and our approach is much less likely to overfit. But while a weak learner, Naive Bayes, benefits strongly from the feature construction, the effect is less pronounced for C4.5, and almost vanishes for an SVM learner.

Keywords: feature construction, classification, outlier detection
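To make the described pipeline concrete, the following is a minimal sketch under stated assumptions: scikit-learn's LocalOutlierFactor stands in for LOF, and a shallow DecisionTreeClassifier stands in for C4.5-Rules (scikit-learn ships no C4.5 implementation); the function name `derive_features` and the feature encoding are illustrative, not the paper's exact procedure.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def derive_features(X, y, k=20):
    """Sketch of the framework: find class outliers, then mine
    discriminative patterns on their k-neighborhoods as new features."""
    new_cols = []
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    for label in np.unique(y):
        mask = (y == label)
        if mask.sum() < 3:
            continue
        # 1. Flag class outliers: points of this class that LOF marks as
        #    outlying with respect to the rest of their own class.
        lof = LocalOutlierFactor(n_neighbors=min(k, mask.sum() - 1))
        flags = lof.fit_predict(X[mask])               # -1 marks an outlier
        outliers = np.where(mask)[0][flags == -1]
        # 2. For each outlier, mine a discriminative pattern on its
        #    k-neighborhood (over all classes) with a shallow tree.
        for idx in outliers:
            _, nbrs = nn.kneighbors(X[idx:idx + 1])
            nbrs = nbrs.ravel()
            if len(np.unique(y[nbrs])) < 2:
                continue                                # nothing to discriminate
            tree = DecisionTreeClassifier(max_depth=2).fit(X[nbrs], y[nbrs])
            # 3. Whether the pattern fires on each instance becomes a new
            #    binary feature appended to the original representation.
            new_cols.append((tree.predict(X) == label).astype(int))
    return np.column_stack(new_cols) if new_cols else np.empty((len(X), 0))
```

The enriched representation would then be `np.hstack([X, derive_features(X, y)])`, to be passed to a downstream learner such as Naive Bayes, C4.5, or an SVM.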